Skip to content

Comments

fix: prevent handshake race condition causing ConnectionClosed failures#644

Merged
mergify[bot] merged 24 commits intosigp:unstablefrom
diegomrsantos:fix/handshake-connection-closed-race
Nov 3, 2025
Merged

fix: prevent handshake race condition causing ConnectionClosed failures#644
mergify[bot] merged 24 commits intosigp:unstablefrom
diegomrsantos:fix/handshake-connection-closed-race

Conversation

@diegomrsantos
Copy link
Member

@diegomrsantos diegomrsantos commented Oct 1, 2025

Issue Addressed

Fixes SSV handshake failures caused by race conditions between connection establishment and protocol stream opening.

Observed Symptoms

2025 Sep 30, @ 23:34:08.674  DEBUG  Handshake failed peer_id=16Uiu2HAmQ13kHGvufdTn7WFS28A2szS8Avk5GVBjcKJergzCd5v1  error=Outbound(ConnectionClosed)
2025 Sep 30, @ 23:34:08.663  DEBUG  Failed to dial peer err=Dial error: dial condition was configured to only happen when both disconnected and there is currently no ongoing dialing attempt, but node is already connected or dial is in progress
2025 Sep 30, @ 23:34:08.662  DEBUG  Let's dial! peer=PeerId(16Uiu2HAmQ13kHGvufdTn7WFS28A2szS8Avk5GVBjcKJergzCd5v1)

Proposed Changes

Evolution of Approach

First Attempt: Identify-based Coordination

  • Queue peers on outbound ConnectionEstablished, wait for Identify::Received, then initiate handshake
  • Why This Was Wrong: Not composable (couples Identify and Handshake), unnecessary complexity (pending_handshakes queue), Identify is optional, against libp2p patterns (behaviors should be self-contained)

Current Approach: Self-Contained Behaviour with Semantic Events

The handshake behaviour is now fully self-contained and auto-initiating:

impl NetworkBehaviour for Behaviour {
    type ToSwarm = Event;  // Emits Event::Completed or Event::Failed
    
    fn on_swarm_event(&mut self, event: FromSwarm) {
        if let FromSwarm::ConnectionEstablished(conn_est) = &event {
            if let ConnectedPoint::Dialer { .. } = conn_est.endpoint {
                if conn_est.other_established == 0 {
                    // Auto-initiate handshake on first outbound connection
                    self.inner.send_request(&conn_est.peer_id, self.node_info.clone());
                }
            }
        }
        self.inner.on_swarm_event(event);
    }
    
    fn poll(&mut self, cx: &mut Context) -> Poll<ToSwarm<Event, ...>> {
        // Process RequestResponse events internally
        // Emit high-level Event::Completed or Event::Failed
    }
}

Why This Works:

  • Composable: Self-contained with no external dependencies
  • Handles concurrent dials: other_established == 0 ensures only the FIRST connection initiates, preventing duplicates
  • Clean event emission: Custom poll() processes low-level RequestResponse events and emits semantic events
  • Simple: No coordination, no pending queues, no external state

Evidence

The concurrent dials test proves the fix:

Without other_established == 0: Both peers initiate handshakes → duplicate requests

With the check: First ConnectionEstablished (other_established == 0) initiates, second (other_established == 1) skips → only ONE handshake

Implementation Details

Behaviour Changes:

  • Custom poll() processes RequestResponse events internally and emits semantic Event::Completed/Event::Failed
  • Events queue (VecDeque<Event>) for buffering handshake results
  • Auto-initiation on first outbound connection via on_swarm_event()
  • Stores NodeInfo for automatic handshake requests

Network Changes:

  • Removed pending_handshakes queue and manual initiation
  • Updated to handle semantic events directly
  • Changed dial policy from NotDialing to DisconnectedAndNotDialing
  • Simplified connection handling

Test Improvements:

  • All tests use drive() for clean, declarative testing
  • Reduced from 611 to 397 lines (35% reduction)
  • Helper functions eliminate duplication

Additional Info

Architecture

Follows idiomatic rust-libp2p patterns where behaviours are self-contained and composable. The behaviour:

  1. Handles its own coordination without external state
  2. Emits high-level semantic events for consumers
  3. Processes low-level protocol details internally in poll()
  4. Has no dependencies on other behaviours

This is the same pattern used in Lighthouse and other production rust-libp2p implementations.

Compatibility

The asymmetric dialer-initiates pattern matches Go SSV's handshake flow, ensuring proper interoperability.

@diegomrsantos diegomrsantos self-assigned this Oct 1, 2025
@diegomrsantos diegomrsantos added bug Something isn't working network v1.0.0 First Mainnet-release labels Oct 1, 2025
@diegomrsantos diegomrsantos force-pushed the fix/handshake-connection-closed-race branch from 33e1d3d to 66a2b48 Compare October 1, 2025 13:00
@diegomrsantos diegomrsantos requested a review from Copilot October 1, 2025 18:20
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes handshake race conditions in the SSV network implementation by deferring handshake initiation until after the Identify protocol confirms connection stability and protocol support.

Key Changes:

  • Queue outbound handshakes instead of initiating immediately on connection establishment
  • Wait for Identify protocol completion before starting handshakes to ensure stable connections
  • Clean up pending handshakes on connection errors and closures

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
anchor/network/src/peer_manager/discovery.rs Changed dial condition from DisconnectedAndNotDialing to NotDialing to reduce noise
anchor/network/src/network.rs Added pending handshake queue, deferred initiation logic, and enhanced connection event handling
anchor/network/src/handshake/mod.rs Added documentation and trace logging for handshake failure debugging

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@diegomrsantos diegomrsantos force-pushed the fix/handshake-connection-closed-race branch from d6194c9 to eac0a0e Compare October 1, 2025 19:47
@diegomrsantos diegomrsantos marked this pull request as ready for review October 1, 2025 23:28
Copy link
Member

@AgeManning AgeManning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'm following why this is necessary.

Firstly, Identify at least in my mind is mainly used for debugging. In Lighthouse its more or less optional, I know ssv doesn't really have network specs, but I'm not sure if identify is a necessary required protocol.

I think all libp2p behaviours should work independently of each other. It allows behaviours to be composable. I think by tying the Handshake to Identify you are effectively saying the handshake protocol cannot function without identify. Which I dont think should be the case.

In regards to the actual issue, which seems to be concurrent handshake requests due to concurrent connections. I would have expected that when we dial, we dial a few multiaddrs (if multiple transports are supported). We do this in a single dial, via the dialopts. We then get libp2p to decide which transport we prioritize. I'm not sure why we would be calling dial again for any reason on an already connected peer. We should ensure we don't re-dial connected peers.

When we open a stream, libp2p uses multistream select to determine which protocols are supported. So we can have a connection with a peer, and the handshake can fail if they dont support it. I don't think this is an issue, if the negotation fails, we just drop the peer. We don't need identify to determine this in advance I dont think.

Would it be sufficient just to check num_established in connection_established and the connection direction before sending the handshake?. I.e if num_established = 1 and direction is outbound, then we do the handshake. This would simplify the changes and not rely on identify. Or is there something else I'm missing?

@diegomrsantos diegomrsantos force-pushed the fix/handshake-connection-closed-race branch from 2208363 to c141a85 Compare October 2, 2025 19:40
@diegomrsantos diegomrsantos requested a review from Copilot October 2, 2025 19:41
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@diegomrsantos diegomrsantos force-pushed the fix/handshake-connection-closed-race branch from 3c4376a to 166809c Compare October 2, 2025 21:12
@diegomrsantos diegomrsantos requested a review from Copilot October 2, 2025 21:35
@diegomrsantos diegomrsantos removed the v1.0.0 First Mainnet-release label Oct 2, 2025
@diegomrsantos diegomrsantos changed the base branch from release-v1.0.0 to unstable October 2, 2025 21:36
@diegomrsantos diegomrsantos changed the base branch from unstable to release-v1.0.0 October 2, 2025 21:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@diegomrsantos diegomrsantos force-pushed the fix/handshake-connection-closed-race branch from 8be966d to b7c427c Compare October 6, 2025 09:41
@diegomrsantos diegomrsantos changed the base branch from release-v1.0.0 to unstable October 6, 2025 09:42
Fixes SSV handshake failures caused by opening protocol streams on
connections that get closed due to concurrent dials or connection
trimming. Observed as OutboundFailure::ConnectionClosed errors.

The fix implements a pending handshake queue that defers handshake
initiation until after the Identify protocol confirms the connection
is stable and supports the SSV handshake protocol.

Key changes:
- Add pending_handshakes queue to defer handshake initiation
- Wait for Identify protocol before initiating handshakes
- Change dial policy from DisconnectedAndNotDialing to NotDialing
- Add 30-second idle connection timeout for handshake completion
- Improve logging to differentiate expected vs unexpected dial failures
- Add comprehensive documentation explaining the race condition
Convert handshake behaviour from type alias to self-contained struct that
automatically initiates handshakes on first outbound connections. This
eliminates the need for external coordination and simplifies the architecture.

Key Changes:

- Replace type alias with proper Behaviour struct wrapping RequestResponseBehaviour
- Implement NetworkBehaviour trait with auto-initiation in on_swarm_event
- Store NodeInfo in the behaviour for automatic handshake initiation
- Use `other_established == 0` check to prevent duplicate initiations during concurrent dials
- Change create_behaviour() to Behaviour::new() (idiomatic Rust constructor)
- Move initiate() to be a method on Behaviour struct
- Remove manual handshake initiation from network.rs
- Add concurrent_dials_both_initiate_handshake test proving the fix works
- Add helper functions for cleaner test structure

Evidence:
The concurrent dials test proves that other_established == 0 prevents duplicate
handshake initiations. Without the check, both peers initiate when they get
Dialer connections (duplicate). With the check, only the first peer initiates.

This makes handshakes fully composable without requiring Identify protocol
coordination or external state management.
Convert standalone handle_event function to a method on Behaviour for better
encapsulation and more idiomatic Rust. Extract test event handling logic into
helper function to eliminate duplication and improve readability.

API Changes:
- handle_event() is now a method: behaviour.handle_event(event)
- handle_request() and handle_response() moved into Behaviour impl as private methods
- Removed unused initiate() method (handshakes are auto-initiated)
- Simplified call sites in network.rs and tests

Test Improvements:
- Added handle_swarm_event_for_test() helper to eliminate 40+ lines of duplication
- Tests are now much more readable and maintainable
- All assertions and tracking logic centralized in one place

Cleanup:
- Removed .with_bandwidth_metrics() call (doesn't exist in current libp2p version)
- Removed unused parameters from test helpers
@dknopik
Copy link
Member

dknopik commented Oct 16, 2025

@diegomrsantos: As Joao noted here #660 the second log line is not necessarily associated with the other two. I've checked the recent logs, and can not really find instances of this issue anymore.

I want to avoid making significant changes to the handshake if they turn out to be motivated by an issue that does not exist.

Nevertheless, during the evolution of this PR it seems to have become a reasonable refactor. However, some changes seem to be unrelated, such as removal of logging. Also, some of the comments seem to still reference the original approach of using the identity protocol. Can you please clean it up a bit so that it is easier to review and more focused on the refactor?

@diegomrsantos
Copy link
Member Author

@diegomrsantos: As Joao noted here #660 the second log line is not necessarily associated with the other two. I've checked the recent logs, and can not really find instances of this issue anymore.

I want to avoid making significant changes to the handshake if they turn out to be motivated by an issue that does not exist.

Nevertheless, during the evolution of this PR it seems to have become a reasonable refactor. However, some changes seem to be unrelated, such as removal of logging. Also, some of the comments seem to still reference the original approach of using the identity protocol. Can you please clean it up a bit so that it is easier to review and more focused on the refactor?

if other_established == 0 is commented out, two handshakes are triggered in concurrent_dials_only_one_handshake

@diegomrsantos diegomrsantos added ready-for-review This PR is ready to be reviewed and removed ready-for-review This PR is ready to be reviewed labels Oct 17, 2025
@diegomrsantos diegomrsantos added ready-for-review This PR is ready to be reviewed and removed ready-for-review This PR is ready to be reviewed labels Oct 17, 2025
@diegomrsantos diegomrsantos marked this pull request as draft October 23, 2025 09:51
@diegomrsantos diegomrsantos marked this pull request as ready for review October 23, 2025 09:51
@claude-code-actions-sigp
Copy link

claude-code-actions-sigp bot commented Oct 23, 2025

Claude finished @diegomrsantos's task —— View job


Code Review Complete

I've conducted a comprehensive review of this PR focusing on code quality, security, performance, testing, and documentation. Here are my findings:


Summary

This PR refactors the handshake behavior to be self-contained and auto-initiating, addressing a potential race condition in concurrent connection scenarios. The implementation follows idiomatic rust-libp2p patterns and includes significant test improvements.

Key Change: The handshake behavior now automatically initiates handshakes on the first outbound connection using the other_established == 0 check, eliminating the need for external coordination.


Critical Issues

1. Race Condition Fix Necessity (Severity: Discussion Needed)

Location: Throughout the PR

As reviewers @AgeManning and @dknopik noted, the original race condition may no longer exist in production. The PR description shows logs that may not be causally related. However, the author demonstrates in concurrent_dials_only_one_handshake test that without the other_established == 0 check, duplicate handshakes would occur.

Recommendation:

  • Clarify whether this is primarily a refactor (main benefit) or a bug fix (secondary benefit)
  • The refactor itself has merit: better encapsulation, cleaner tests, self-contained behavior
  • If positioning as a bug fix, provide concrete evidence from production logs showing the race condition

2. Breaking Change in Public API (Severity: High)

Location: anchor/network/src/handshake/mod.rs:23-42

The PR fundamentally changes the public API:

  • Before: pub type Behaviour = RequestResponseBehaviour<Codec> + handle_event() function
  • After: Custom pub struct Behaviour with different event types
// Old API (removed)
pub type Behaviour = RequestResponseBehaviour<Codec>;
pub type Event = <Behaviour as NetworkBehaviour>::ToSwarm;
pub fn handle_event(...) -> Option<Result<Completed, Failed>>

// New API
pub struct Behaviour { ... }
pub enum Event { Completed {...}, Failed {...} }

Impact: This is a breaking change for any external code using the handshake module.

Recommendation:

  • Document this as a breaking change in commit message and PR description
  • Since this is within the anchor workspace and not a published library, the impact is contained
  • Consider whether gradual migration path is needed

3. Error Handling: Silent Failure (Severity: Medium)

Location: anchor/network/src/handshake/mod.rs:111

let _ = self.inner.send_response(channel, self.node_info.clone());

The response sending can fail, but errors are silently discarded. While this may be acceptable for handshake responses (the peer will timeout), it makes debugging difficult.

Recommendation:

if let Err(e) = self.inner.send_response(channel, self.node_info.clone()) {
    trace!(?peer_id, ?e, "Failed to send handshake response");
}

4. Potential Panic in Production Code (Severity: High)

Location: anchor/network/src/handshake/mod.rs:260

other => {
    return Poll::Ready(
        other.map_out(|_| unreachable!("GenerateEvent already handled")),
    );
}

While the comment explains this should be unreachable, using unreachable!() will panic if the assumption is violated (e.g., due to future libp2p changes).

Recommendation: Consider a more defensive approach:

other => {
    trace!("Bubbling up non-GenerateEvent ToSwarm event");
    return Poll::Ready(
        other.map_out(|_| {
            // This closure should never be called because we've handled all GenerateEvent variants.
            // If it does get called, it indicates a programming error or unexpected libp2p behavior.
            panic!("Unexpected: map_out closure called on non-GenerateEvent. This is a bug.")
        }),
    );
}

Or use a custom error type instead of panicking.


Code Quality Issues

5. Let Chains Syntax (Severity: Info)

Location: anchor/network/src/handshake/mod.rs:133-136

if let libp2p::swarm::FromSwarm::ConnectionEstablished(conn_est) = event
    && let libp2p::core::ConnectedPoint::Dialer { .. } = conn_est.endpoint
    && conn_est.other_established == 0

Let chains require Rust 1.64+. While likely fine for this project, consider documenting minimum Rust version in Cargo.toml if not already present.

6. Clone Performance (Severity: Low)

Location: Multiple locations in handshake/mod.rs

self.inner.send_request(peer_id, self.node_info.clone());

NodeInfo is cloned on every handshake request/response. While NodeInfo is relatively small, consider if Arc<NodeInfo> would be more efficient, especially under high connection load.

Analysis:

  • NodeInfo contains String fields and an Option<NodeMetadata> with more Strings
  • Each clone involves multiple heap allocations
  • Under heavy connection churn, this could add up

Recommendation: Profile first, optimize only if this shows up as a bottleneck.

7. Test Quality Improvement Suggestion

Location: anchor/network/src/handshake/mod.rs:402-408

The concurrent dials test verifies no duplicate handshakes by expecting a timeout on the second drive() call. This is good, but the 100ms timeout is arbitrary and could cause flakiness on slow CI systems.

Recommendation:

// Instead of timing out, explicitly check the event queue is empty
// Or use a shorter timeout (10-50ms) with a clear comment about why
let result = timeout(Duration::from_millis(50), async {
    // This should timeout - no more events should occur
    let ([_local], [_remote]): ([Event; 1], [Event; 1]) =
        drive(&mut local_swarm, &mut remote_swarm).await;
}).await;

assert!(result.is_err(), 
    "Expected no more handshake events (timeout), but got additional events. \
     This indicates duplicate handshakes occurred, violating the other_established==0 check.");

Security Considerations

8. Network Mismatch Handling (Severity: Info)

Location: anchor/network/src/handshake/mod.rs:144-152

The network mismatch detection is good. The code properly rejects connections from peers on different networks. No issues found.

9. DoS Vector: Resource Exhaustion (Severity: Low)

Location: anchor/network/src/handshake/mod.rs:84

events: VecDeque::new(),

Events are buffered in a VecDeque with no size limit. A malicious peer rapidly initiating connections could theoretically cause memory growth.

Analysis:

  • Mitigated by connection limits in ConnectionManager
  • Events are consumed in poll() on each event loop iteration
  • Unlikely to be a real issue in practice

Recommendation: Document the assumption that connection limits prevent unbounded growth, or add a maximum queue size.


Performance Considerations

10. Idle Connection Timeout (Severity: Info)

Location: anchor/network/src/network.rs:754

.with_idle_connection_timeout(Duration::from_secs(30));

Good improvement. The 30-second timeout provides time for handshake completion while still cleaning up idle connections. The comment explains the rationale well.

11. Dial Condition Change (Severity: Important)

Location: anchor/network/src/peer_manager/discovery.rs:216

- .condition(PeerCondition::NotDialing)
+ .condition(PeerCondition::DisconnectedAndNotDialing)

This is a significant behavior change:

  • Before (NotDialing): Could dial already-connected peers (creating multiple connections)
  • After (DisconnectedAndNotDialing): Only dial if both disconnected AND not currently dialing

Impact: This prevents creating duplicate connections from concurrent dials, which complements the other_established == 0 check.

Recommendation: This change should be highlighted in the PR description as it's a critical part of preventing the race condition.


Testing

12. Test Improvements (Severity: Positive)

Location: anchor/network/src/handshake/mod.rs:276-436

Excellent test improvements:

  • ✅ Reduced from 611 to 397 lines (35% reduction)
  • ✅ All tests use drive() for declarative testing
  • ✅ Helper functions eliminate duplication (assert_completed, assert_network_mismatch)
  • ✅ New test for concurrent dials (concurrent_dials_only_one_handshake)
  • ✅ Tests no longer call internal functions - fully black-box

Positive feedback: The test quality is significantly improved.

13. Test Coverage Gap (Severity: Medium)

Location: Test suite

Missing test cases:

  • Rapid connection/disconnection: Does other_established handle reconnection properly?
  • Multiple simultaneous dials: What if 3+ connection attempts happen simultaneously?
  • Handshake timeout: No test for handshake timeout scenarios
  • Connection closed during handshake: What happens if connection drops mid-handshake?

Recommendation: Consider adding these edge case tests, especially for rapid reconnection scenarios.


Documentation

14. Documentation Quality (Severity: Positive)

Location: anchor/network/src/handshake/mod.rs

Good documentation:

  • ✅ Clear doc comments on public items
  • ✅ Explanation of other_established == 0 check
  • ✅ Comments explain the asymmetric dialer-initiates pattern

15. PR Description Clarity (Severity: Info)

The PR description is excellent:

  • ✅ Explains the evolution from Identify-based to self-contained approach
  • ✅ Provides evidence (concurrent dials test)
  • ✅ Explains architecture rationale

Minor suggestion: Add a migration guide for any downstream code using the old handle_event() API.


Architecture Assessment

16. Self-Contained Behavior Pattern (Severity: Positive)

The refactor to make the handshake behavior self-contained is architecturally sound:

  • ✅ Follows rust-libp2p idioms (composable, independent behaviors)
  • ✅ No coupling to Identify protocol
  • ✅ Clean separation of concerns
  • ✅ Events are semantic (Completed/Failed) rather than low-level (RequestResponse events)

This aligns with how Lighthouse and other production rust-libp2p implementations structure their code.

17. Event Processing in poll() (Severity: Good)

The custom poll() implementation that translates RequestResponse events to semantic events is well-designed:

  • ✅ Properly handles all event variants
  • ✅ Queues events for emission
  • ✅ Bubbles up non-GenerateEvent ToSwarm events

Minor Issues

18. Unused Import (Severity: Trivial)

The PR already fixed this (debug import removed from handshake/mod.rs).

19. Logging Consistency (Severity: Low)

Location: anchor/network/src/network.rs:637

The handshake failure logging uses debug! level, which is appropriate. However, consider adding a counter metric for monitoring production handshake success/failure rates.

Note: The code already has metrics in handle_handshake_result() - good!


Recommendations Summary

Must Fix (Before Merge):

  1. Replace unreachable!() with safer error handling (Issue Correct CI #4)
  2. Add trace logging for handshake response send failures (Issue Remove sigp-crates dependency #3)
  3. Clarify PR positioning: refactor vs bug fix (Issue Draft a Testing Framework #1)

Should Fix:
4. Document breaking API changes in PR description (Issue #2)
5. Add test for rapid reconnection scenarios (Issue #13)
6. Highlight dial condition change in PR description (Issue #11)

Nice to Have:
7. Consider Arc<NodeInfo> for clone efficiency (Issue #6)
8. Add event queue size limit documentation (Issue #9)
9. Add migration guide for old API (Issue #15)


Conclusion

This is a well-executed refactor that improves code organization, test quality, and follows rust-libp2p best practices. The self-contained behavior pattern is architecturally sound.

Main concern: The necessity of the race condition fix needs clarification. If this is primarily a refactor (with race prevention as a secondary benefit), that should be clearly stated.

Code quality: High. The implementation is clean, well-tested, and follows Rust idioms.

Recommendation: Approve with requested changes to address the critical issues (unreachable panic, error handling, PR description clarity).


Copy link
Member

@dknopik dknopik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two nits, otherwise good!

Remove redundant Completed and Failed structs that duplicated Event
variant data. Pass Event directly to handlers instead of wrapping in
Result.

This eliminates unnecessary type conversions and follows idiomatic
libp2p patterns for NetworkBehaviour events.
Re-add structured logging for connection errors that was removed in a
previous refactoring. These errors are distinct from routine connection
events:

- OutgoingConnectionError: Failed to establish outgoing connection
- IncomingConnectionError: Failed to accept incoming connection
- ConnectionClosed with cause: Established connection failed at runtime

Log errors at debug level and clean closures at trace level following
libp2p's own logging patterns.
Replace silent failure handling with trace-level logging when handshake
response cannot be sent. This occurs when the response channel is
already closed (peer disconnected), which is not an error condition but
useful for debugging handshake flows.

Uses trace level as this is an expected race condition that doesn't
require operator attention.
@diegomrsantos diegomrsantos requested a review from dknopik November 3, 2025 14:36
Copy link
Member

@dknopik dknopik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@dknopik dknopik added ready-for-merge and removed ready-for-review This PR is ready to be reviewed labels Nov 3, 2025
@mergify mergify bot merged commit e2475f4 into sigp:unstable Nov 3, 2025
17 checks passed
@diegomrsantos diegomrsantos deleted the fix/handshake-connection-closed-race branch November 3, 2025 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working network ready-for-merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants